An efficient text analyzer with prosody generator-driven approach for Mandarin text-to-speech

Authors

  • Shaw-Hwa Hwang
  • Cheng-Yu Yeh
Abstract

A new approach to designing an efficient text analyser is proposed. A prosody generator-driven method is employed to design an efficient text analyser for Mandarin text-to-speech. It yields a simpler text-analysis structure, a more suitable classification of linguistic features and a more efficient contribution of linguistic features to the prosody generator. Three heuristic and theoretical methods are used to analyse and examine the capability of each linguistic feature. First, the contribution of each linguistic feature to the prosody generator is examined experimentally. Secondly, the cross-influence of each linguistic feature on the prosody generator is analysed. Thirdly, the problem of over- and under-classification of the linguistic features is inspected. Finally, these three analytic results are used to design an efficient text analyser. In total, 35 243 Chinese characters are employed to examine the performance of the text analyser. Only 79 ms of CPU time on a P4-1.4G PC is needed for word segmentation and POS tagging. Correct rates of 97.5% and 93.2% are achieved for word segmentation and POS tagging, respectively. This confirms that the text analyser performs well. Moreover, a Mandarin text-to-speech system is implemented to inspect the performance of the text analysis and its contribution to the prosody generator. More natural and fluent speech is obtained at lower computational cost. The MOS of prosody for the synthesised and original speech are 4.2 and 4.8, respectively, which is reasonably good.

Similar resources

Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One interesting topic in the multimedia domain is enabling computers to produce speech. Speech synthesis grants the computer the human ability of speech production. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...

Text-to-Speech Synthesis for Mandarin Chinese

A Text-To-Speech (TTS) synthesizer is a computer-based system that is able to automatically read text aloud, regardless of whether the text is introduced by a computer input stream or by a scanned input that is submitted to an optical character recognition (OCR) engine. TTS synthesis can be used in many areas, such as telecommunication services, language education, vocal monitoring, multimedia, and as ...

Unsupervised prosody labeling for constructing Mandarin TTS

This paper introduces an unsupervised prosody labeling method for preparing a large speech corpus used in developing a Mandarin Text-to-Speech system. Adopting a four-layer prosody hierarchy, the proposed method performs an unsupervised segmental clustering that iteratively segments spoken utterances into strings of prosodic constituents and models the patterns of the segmented prosodic constit...

Evaluating prosody of Mandarin speech for language learning

This paper proposes an approach to automatically evaluate the prosody of Chinese Mandarin speech for language learning. In this approach, we grade the appropriateness of prosody of speech units according to a model speech corpus from a teacher’s voice. To this end, we build two models, which are the prosody model and the scoring model. The prosody model that is built from the teacher’s speech p...

A Statistical Model with Hierarchical Structure for Predicting Prosody in a Mandarin Text-to-speech System

In this paper we propose a statistical prosody model with hierarchical structure for a Mandarin Text-to-Speech (TTS) system. There are four levels in our model: syllable level, word level, breath group (prosodic phrase) level, and utterance level. Here “hierarchy” means that each lower level is a subset of a higher level. The prosodic information is first found in each level, and then they are c...

Publication date: 2003